PREFACE


Most  of  the  work  described  here  was  done  in  the  summer  of  1965  as  part 
of  our  continuing  research  in  the  general  area  of  machine  problem  solving. 
In  the  fall  of  1965,  a  (limited  distribution)  report  covering  this  work 
was  issued  at  RCA  Laboratories  and  at  the  Mental  Health  Research  Institute 
of  the  University  of  Michigan.  Plans  to  revise  this  report  and  to  combine 
it with subsequent results of our work on representations in
question-answering systems have delayed its wider distribution. However, since the
work  described  in  the  report  is  relevant  to  much  current  research  on 
graphic  languages  and  question-answering,  we  have  decided  to  extend  its 
availability  to  the  technical  community  in  its  present  form,  and  to  issue 
it  as  a  technical  report. 

Saul  Amarel 
Princeton,  N.  J. 

May  1968 
ABSTRACT


In  this  paper  we  discuss  means  of  representing  states  of  the  world 
which  are  easily  described  as  pictures  of  triangles,  circles,  and  squares 
in horizontal, vertical, or enclosure relationships; our study is oriented
to the comparative evaluation of different representations for
computer-based question-answering systems.

Three  languages  for  representing  such  pictorial  data  are  constructed. 
The  basic  units  of  the  first  are  pictures,  of  the  second  trees,  and  of  the 
third  sentences.  Each  of  the  three  languages  is  further  modified  to  serve 
for  describing  data,  for  specifying  constructions,  for  posing  queries,  and 
for  stating  answers.  The  interrelations  among  the  various  specialized  uses 
of these three languages are investigated. Queries are best posed in an
English-like language, computer search best proceeds on data represented as
trees, and answers can often be best presented in picture representations.
Results are in the form of a) context-free generative grammars for the
different languages expressed as production rules, b) theorems showing
correspondences between, say, all query sentences and all pictorial answers,
and c) formulas for the effort to search for answers, and for optimal trees to
store data.
I. INTRODUCTION


Since the potential of computers for non-arithmetic processes has
been recognized, the problems of pictorial and linguistic data processing
have attracted increasing interest. One way to increase the sophistication
of the computer art in this direction is to present the computer with data
in pictorial form and interrogate it in restricted English. To answer queries,
the machine should be able to search its internal memory for data responsive
to the query. This involves the capability of processing at least 3 different
languages: pictorial representation, representation as sentences in English,
and representation suitable for updating and searching the machine's memory.

A  user  may  require  the  machine  to  construct,  search,  describe,  or 
interrogate  a  given  body  of  data;  he  may  state  his  requirement  in  any  one 
of  the  three  languages;  he  may  have  fed  the  data  into  the  machine  in  any  one 
of  them;  and  he  may  wish  the  response  in  any  one  of  them.  It  is,  therefore, 
of  interest  to  formally  study  the  relationship  between  these  three  languages. 
Can anything represented in one language also be represented in the other
two?  Can  queries  asked  in  one  language  be  answered  in  another?  What  are  the 
relative  merits  of  these  different  languages  for  various  purposes? 

These questions are of some interest in themselves, though the answers
are obvious for the highly restricted domain of discourse considered here.
The techniques of answering them can, however, be extended as the domain of
discourse is extended and as the languages are enriched. Both the techniques
and the answers are useful in investigating the informational equivalence
of two descriptions (e.g., if both lead to the same construction
specifications) and the relevance of answers to queries, and in assessing the
choice among different means of representation available to the designer of an
information system.


Similar problems were studied by Kirsch, Simmons and Londe, and
Sutherland [9], to mention a few. The idea of using our restricted domain
of discourse was suggested by S. Amarel (private communication) and mentioned
by M. Minsky. The use of trees for storage and search has been studied
in more detail by S. Amarel and R. McNaughton as an application of
multi-computer systems [6].

We have not studied questions of translating from these languages into
the predicate calculus, as is being done, for example, by Bohnert [2], Cooper,
and Darlington. Nor have we addressed ourselves to the important problem of
how to get a machine to select significant conjectures, to pose deep questions,
or to condense or summarize information. We will touch upon this problem
in a forthcoming paper on methods for translating query sentences into
computer search programs.
II. A PICTORIAL LANGUAGE


By a language L we mean the set of all possible sentences generated
by a linguistic system S(L). A linguistic system consists of a quadruple

<V_T, V_N, U, R>

in which: V_T denotes the terminal vocabulary, which, for a pictorial language,
consists of geometric objects such as □, ○, △; V_N denotes the non-terminal
vocabulary, which, for a pictorial language, consists of configurations;
U denotes a special element in V_N corresponding to the unit of the
language, a complete graphic message or picture in a pictorial language; R
denotes a set of production rules, to be illustrated next for a pictorial
language.
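As a rough illustration of the quadruple, the following Python sketch stores a linguistic system as a small record. It is only a symbolic stand-in: the report's vocabularies are drawn figures, whereas the names `LinguisticSystem`, `"CONFIG"` and `"U"` and the two sample rules are assumptions made here for illustration. Rules are written as (replaced, replacement) pairs, following the report's "specific to general" direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinguisticSystem:
    """The quadruple <V_T, V_N, U, R>, with symbols standing in for figures."""
    terminals: frozenset      # V_T
    nonterminals: frozenset   # V_N
    unit: str                 # U, a distinguished element of V_N
    rules: tuple              # R, as (replaced, replacement) pairs

    def __post_init__(self):
        # The text requires U to be a special element of V_N.
        if self.unit not in self.nonterminals:
            raise ValueError("U must be an element of V_N")


# Hypothetical symbolic stand-in for part of the pictorial system.
pictorial = LinguisticSystem(
    terminals=frozenset({"square", "circle", "triangle"}),
    nonterminals=frozenset({"CONFIG", "U"}),
    unit="U",
    rules=(("triangle", "CONFIG"), ("CONFIG", "U")),
)
```

The only constraint checked is the one stated in the definition, namely that U belongs to V_N.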


The first of these rules for the pictorial linguistic system asserts
that U consists of a framed configuration:

Rule 1: [Figure: a double-line frame enclosing a dotted-line rectangle.]


Rule 1 is understood as follows. A picture (corresponding to "sentence")
consists of a double-line frame. Inside the frame is a "configuration", a
rectangle with dotted lines. Unless otherwise indicated, the figures with
dotted lines can be located anywhere inside the frame and have any size.
Rule 1 states that one figure [figure] can be replaced by or produced from
[figure].*

* The rules of replacement are formulated here in the reverse order to that
conventionally used for phrase structure languages; i.e., the replacements
here are "from specific to general", in contrast to replacements "from
general to specific" that are common in phrase structure grammar
formulations.


The second rule specifies how configurations may be formed.

[Figure: Rules (2a)-(2e), each showing an arrangement of dotted-line
configurations enclosed in an outer dotted-line rectangle.]


Unless otherwise indicated, the figures in solid lines can be located anywhere
and have any size provided their edges do not intersect any other edges. Rule
(2b), for example, states that any pair [figure] can be replaced by [figure].
Generally, the rules assert that whenever there is a figure having the form
that remains when the outer dotted line is removed, that figure can be
replaced by the corresponding outer square of dotted lines.

Rule 3 relates configurations to specific objects:

[Figure: Rules (3a)-(3c), each showing one geometric object inside a
dotted-line rectangle.]


Rule (3c) asserts that a triangle of any size can be replaced by [figure].
To verify that △ △ is a picture, we apply Rule (3c) to the left triangle
and form: [figure]. We also apply (3c) again to the right triangle to get
altogether: [figure]. We now apply Rule (2a) to get [figure]. We now apply
Rule 1 to get [figure], and this terminates the verification that we have a
picture. We can repeat this procedure without removing the figures being
replaced, and numbering the order in which the rules were applied.
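The verification just described can be mimicked on a symbolic encoding. The sketch below is an illustration under stated assumptions, not the report's procedure: pictures are encoded as nested tuples (a hypothetical encoding), `("H", a, b)` stands for two configurations side by side (Rule 2a) and `("V", a, b)` for one above the other (Rule 2b). The enclosure rules (2c)-(2e) are omitted, and the assignment of circle to (3a) and square to (3b) is assumed; only triangle for (3c) is stated in the text.

```python
# Assumed mapping from shapes to Rule 3 variants; only "triangle" -> 3c
# is given explicitly in the text.
SHAPE_RULE = {"circle": "3a", "square": "3b", "triangle": "3c"}


def reduce_config(node, trace):
    """Reduce a node to a configuration, appending each rule used to trace."""
    if isinstance(node, str):
        if node not in SHAPE_RULE:
            raise ValueError(f"not in the terminal vocabulary: {node!r}")
        trace.append(SHAPE_RULE[node])
        return
    op, first, second = node
    if op not in ("H", "V"):
        raise ValueError(f"unsupported arrangement: {op!r}")
    reduce_config(first, trace)
    reduce_config(second, trace)
    trace.append("2a" if op == "H" else "2b")


def verify_picture(picture):
    """Return the rules applied, in order, ending with Rule 1."""
    tag, body = picture
    if tag != "frame":
        raise ValueError("a picture needs an outer frame")
    trace = []
    reduce_config(body, trace)
    trace.append("1")
    return trace


print(verify_picture(("frame", ("H", "triangle", "triangle"))))
# ['3c', '3c', '2a', '1']
```

The printed list matches the order used in the text for the two-triangle picture: (3c) twice, then (2a), then Rule 1.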


We take a more complex example. [Figure not recoverable.]

The terminal vocabulary V_T consists of all squares, circles and
equilateral triangles. The "non-terminal vocabulary" V_N consists of all
rectangles made of dotted lines, all figures like [figure], like [figure],
and of all rectangles with double edges. The latter plays the role of a
picture-designator U. The rules R all consist of a figure of V_N enclosing
a figure of either V_T or V_N, stating that the enclosed figure can be
replaced by the enclosing figure.


*  This  corresponds  to  the  relation  expressed  by  "below",  used  at  the  end 
of  this  section. 


The accompanying figure [figure] is not a picture according to the above
rules. Note also that the picture [figure] could not be distinguished from
either [figure] or [figure]. Rules 1-3 specify a particular pictorial
description language L_GD, which excludes many figures we would
realistically like to call "pictures". By augmenting V_T, V_N and R of the
system S(L_GD), we can enable L_GD to better approximate reality. But this
is not our primary intent here. In the language L_GD specified by the above
V_T, V_N, R there is no limit to the number of geometric objects which may be
in one picture.

By a construction language we mean the set of all possible imperative
"sentences" generated by a linguistic system. Each imperative "sentence" is
a command specifying a construction. In the case of a pictorial construction
language L_GC, each "sentence" is in a form corresponding to a "parsed"
picture. The rules R_GC are the same as R_GD, except that they require the
interior of a dotted-line square to replace the outer boundary rather than
vice versa. Only Rule 1 differs in that it uses a wavy line instead of a
double line boundary.

Each rule of R_GC specifies a construction step. It consists of
erasing dotted lines in a surrounding rectangle, or replacing the wavy line
by a double line frame, or terminating when the objects of V_T are reached.
Both V_T and V_N are the same for L_GC as they were for L_GD.
As an example, suppose we wish to represent an order to construct
(copy) the picture [figure]. The order would first be described in L_GD as
follows: [figure]. The following figure [figure] is the corresponding
"imperative sentence" of L_GC. To check that this is a well-formed member
of L_GC we proceed by: 1) applying the rule [figure] to get [figure];
2) applying the rule [figure] to the pair to get △ △; 3) replacing the
wavy line of the frame by the double line. It is important to keep in mind
the distinction between rules legitimizing the form of construction orders
and rules for executing such orders.


By a query language L_GQ we mean the set of all possible interrogative
"sentences" generated by a system S(L_GQ) = <V_NQ, V_TQ, U_Q, R_Q>. Each
query "sentence" states what is wanted and what is known. It always makes
implicit reference to a corpus to be searched. The corpus is a subset of
L_GD, not L_GD itself. In a graphic query language, the queries are again in
a form corresponding to pictures. Instead of the double-line frame which
denoted a picture to be analyzed in L_GD, or a wavy-line frame which denoted
a picture to be copied in L_GC, we use a dot-dash-line frame to denote a
pictorial query. Question-marks are in the place of objects in V_T. Answers
consist of pictures in L_GD with all question-marks replaced by objects in
V_T. The linguistic system for L_GQ is the same as that for L_GD except
that: in Rule 1, the double-line frame is replaced by a dot-dash line; the
object ? is added to V_T. The rules are unchanged. The following figure
shows the order of the steps and the rules needed to verify that it is a
pictorial query.


[Figure: a parsed pictorial query, with the steps numbered in the order of
rule application: 1 (5), 2 (3c), 3 (2a), 4 (1'').]

Here, (5) is: [figure] and (1'') is [figure], which is U.

By a processing or search or answer language L_GA, we mean the set of
all answer-sentences generated by a system S(L_GA). A pictorial answer
sentence is a picture of L_GD in which the object replacing ? in a pictorial
query of L_GQ is surrounded by a frame like [figure]. The outer frame is
similarly replaced. Thus, [figure] is an answer-sentence. The system
S(L_GA) is the same as that for L_GD except that the framed □, the framed ○
and the framed △ are added to V_T, 3 corresponding rules are added to
rule 3, and rule 1 is replaced by (1''').


To verify that [figure] is an answer-sentence of L_GA, proceed in
the order shown in the accompanying figure. [Figure not recoverable.]

It is also possible for both an object (e.g., △) and ? to appear in a box,
e.g., [figure]. This is also a pictorial query. It asks for
verification of the figure. Both rules (3c) and (5) apply to the right
inner box. Except for the combination of ? with objects, only one
rule can apply to a dotted-line box.
We summarize the four aspects of our pictorial language system
below:

[Table with columns: Terminal Vocabulary V_T; Non-terminal Vocabulary V_N;
Unit of the Language U; Rules of Formation R (for verifying membership
in L). Entries not recoverable.]

Consider the following 4 examples: [Figures and their verification not
recoverable.]


III.  ANSWERING  QUESTIONS  AND  EXECUTING  CONSTRUCTION 
ORDERS  GIVEN  IN  A  PICTORIAL  LANGUAGE 

We now provide a totally different set of rules for:

Executing construction. Applicable procedure:

  C1.  [figure]
  C2a. [figure]; similarly for C2b-C2e; reverse the arrow of 2a-2e.
  C3a. [figure]; and similarly for C3b, C3c and C6a, C6b, C6c.

  E.g., given C, apply C1, C2a, C3c, C3c, to produce D, the unit of L_GD.

Checking that the result of construction is as specified. Applicable
procedure:

  (Ch 1) Parse the result using (1)-(3c), i.e., record the rules and the
         order in which they are used, deleting rule (1). By a parsing
         tree we mean a tree such as

                 2a
                /  \
              3c    3c

         for D above.
  (Ch 2) Delete rules (1') and (4) from the parsing of the construction
         statement (e.g., C).
  (Ch 3) Check that parsing of result = parsing of construction
         statement to within partial order.

Answering a query. Given: a query [figure] and a corpus [figure].
Applicable procedure:

  (Q1) [not recoverable]
  (Q2) Form the parsing tree for the query; delete rule (1'').
  (Q3) Search all parsing trees in a corpus for pictures with (1)
       deleted, until one is found which matches that of the query except
       for rule (5). E.g., form

                 2a
                /  \
              3c    3c

       and search all such trees. This tree matches the above except
       that 3c is in place of 5.
  (Q4) Suppose rule 3x, x = a, b, c, holds in place of rule (5). Form rule
       C6x. E.g., form C6c, which is [figure].
  (Q5) Put C in front of all rules used in the query, with C6x replacing
       rule (5).
  (Q6) Form a construction statement from all the rules used in the
       query by combining the dotted lines on the left side of a rule
       with what they replace.
  (Q7) Execute the above construction rules. E.g., apply rules C3a, C3c,
       C6c; applying C6c yields [figure].
  (Q8) [not recoverable]
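The search in steps (Q2)-(Q4) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the report's procedure: parsing trees are encoded as nested tuples `(rule, child, ...)`, a hypothetical encoding with rules (1) and (1'') already deleted, and trees are compared as ordered trees, which is stricter than matching "to within partial order". The function names are invented for this sketch.

```python
SHAPE_RULES = ("3a", "3b", "3c")


def match(query, candidate):
    """Return the rules found in place of rule 5, or None if no match."""
    q_rule, *q_kids = query
    c_rule, *c_kids = candidate
    if q_rule == "5":
        # Rule (5) marks a question mark; a shape rule 3x may stand there.
        if c_rule in SHAPE_RULES and not c_kids:
            return [c_rule]
        return None
    if q_rule != c_rule or len(q_kids) != len(c_kids):
        return None
    found = []
    for q_kid, c_kid in zip(q_kids, c_kids):
        sub = match(q_kid, c_kid)
        if sub is None:
            return None
        found.extend(sub)
    return found


def answer(query, corpus):
    """Scan the corpus in order (Q3); form rule C6x for each 3x found (Q4)."""
    comparisons = 0
    for tree in corpus:
        comparisons += 1
        hit = match(query, tree)
        if hit is not None:
            return ["C6" + rule[1] for rule in hit], comparisons
    return None, comparisons


query = ("2a", ("5",), ("3c",))
corpus = [("2b", ("3a",), ("3a",)), ("2a", ("3c",), ("3c",))]
print(answer(query, corpus))  # (['C6c'], 2)
```

When no tree matches, the function returns `None` together with the number of trees examined, which corresponds to the case in which the query specifications are not met.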


Note  that  all  the  rules  of  formation  for  the  pictorial  language  we 
have  written  are  in  the  form  of  production  rules.  We  have  generalized  from 
the  conventional  notion  of  concatenation  used  in  linguistics  --  which  means 
placing  two  one-dimensional  strings  next  to  each  other  --to  where  it  can 
also  mean  adjoining  two-dimensional  arrays  into  horizontal,  vertical,  or 
enclosure  adjacency  relations.  This  casts  our  pictorial  language  clearly 
into  the  class  of  context-free  languages,  with  our  extended  interpretation 
of  concatenation.  All  the  notions  and  results,  including  the  problems  of 
structural ambiguity, apply to our language. The following results are all
derived from this.

Theorem 3.1: To every picture in L_GD corresponds at least one pictorial
construction statement in L_GC.

Proof: Given a figure like D, identify the elements of V_T in it. Surround
each by a square of dotted lines. Then apply the rules (Q6) which were used
in forming a construction statement, plus the rule [figure].
It is easily verified that the result is an element of L_GC. Because there
is, in general, a choice in which rules (Q6) can be applied, more than one
construction statement corresponds to a given picture. For example, if the
picture is [figure], and we have rules (C'2b): [figure], (C'2a): [figure]
and (C'3a): [figure], we get both [figure] and [figure], depending
on whether we apply the rules (C'3a) 3 times, (C'2b), (C'2a) or (C'3a) 3
times, (C'2a), (C'2b). If the given figure is not an element of L_GD, rules
(C'2a) and (C'2b) will not apply because rules 2a and 2b do not apply.

Theorem 3.2: To every pictorial construction statement in L_GC corresponds
a unique picture in L_GD.

Proof: Given any element of L_GC, apply rules C1, C2a, ..., C2e, C3a, C3b,
C3c, where applicable, in that order. Apply Rules (Ch 1), (Ch 2), (Ch 3) to
verify that the result of the construction is as specified. Since applying
Rule Ch 1 involves parsing the result by the rules of S(L_GD), this verifies
that the result of construction is in L_GD. The order of applying rules
C1-C3c is specified. (Applying these rules is tantamount to erasing the
dotted lines, from the outside in.) Hence, the resulting picture is unique.

Theorem 3.3: To every picture D in L_GD which contains n objects of V_T
correspond at least $2^n$ queries in L_GQ, each having an answer in L_GA
corresponding to D.

Proof: Given a picture D, we form a query by: 1) parsing D according to
the rules of S(L_GD), leaving in all dotted lines with what replaces them;
2) replacing the double-line frame by the dot-dash-line frame; 3) replacing
either one, two, three, or all n, of the objects in D by ?. Step 3 can be
done in [formula] ways. For each way, there is at least one parsing of D. To
verify that the result of this construction is in L_GQ, apply the rules of
S(L_GQ). Applying steps (Q1)-(Q8) results in an element of L_GA. This is
verified by applying the rules of S(L_GA). To check that this is an answer,
replace the outer frame according to [figure] → [figure] and delete the
inner answer frame on the inside wherever it occurs. The result is
identical with D.

Theorem 3.4: Consider any query Q in L_GQ with an associated corpus Corp (Q),
which is a finite subset of L_GD containing m pictures. Suppose that a
fraction f of the m pictures correspond to answers for Q, each of the m
pictures having the same probability of corresponding to an answer. If
f > 0, it will take, on the average,
$\frac{f}{2}\,[m^2 + 3m - f(2m^2 + 3m) + f^2 m^2 + 2]$
search comparisons to find an answer. If f = 0, it will take m comparisons
to ascertain that Q has the answer: "Query specifications are not
met."

Proof: To answer Q, execute steps (Q1)-(Q8) to produce a pictorial answer.
To check that this is in L_GA and is an answer, proceed as in the proof of
Th. 3.3. Because the corpus to be searched in step (Q3) is finite, the
procedure will terminate in a finite number of steps. If, in step (Q3), the
parsing tree of Q does not match any tree in the corpus, all m such trees
will have been checked to ascertain that the query specifications are not
met. If there are fm pictures in the corpus which correspond to an answer,
then the probability that the i-th picture in some ordering of the m
pictures of the corpus is an answer is f. If we examine all the pictures of
the corpus in order, this is the probability of stopping at the i-th, and
the expected number of pictures examined before the first match is
$\sum_i i \cdot f$. The sum goes from $i = 1$ to $i = m - fm + 1$, because
in the extreme case all the fm answers are in a row at the end. The sum is
$\frac{f}{2}\,[(m - fm + 1)(m - fm + 2)]$, which is
$\frac{f}{2}\,(m^2 + 3m - f(2m^2 + 3m) + f^2 m^2 + 2)$.
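The algebra in the last step of the proof can be checked numerically. The sketch below only reproduces the two expressions as stated in the theorem and its proof; it does not test the underlying probabilistic assumptions. The function names are invented for this illustration, and exact fractions are used so that the two forms can be compared without rounding.

```python
from fractions import Fraction


def expected_comparisons(m, f):
    """Closed form stated in Theorem 3.4 (for f > 0)."""
    f = Fraction(f)
    return f / 2 * (m**2 + 3 * m - f * (2 * m**2 + 3 * m) + f**2 * m**2 + 2)


def expected_by_sum(m, f):
    """Sum of i*f for i = 1 .. m - f*m + 1, as written in the proof."""
    f = Fraction(f)
    upper = m - f * m + 1
    if upper.denominator != 1:
        raise ValueError("f*m must be a whole number of pictures")
    return sum(i * f for i in range(1, int(upper) + 1))


# Example: a corpus of 10 pictures, 2 of which are answers (f = 1/5).
print(expected_comparisons(10, Fraction(1, 5)))  # 9
print(expected_by_sum(10, Fraction(1, 5)))       # 9
```

Both forms agree for any m and f with fm a whole number, since the sum of i from 1 to N is N(N + 1)/2 with N = m - fm + 1.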
If Q has a single question-mark, the corpus could contain at most 3
answers, corresponding to □, ○, or △ in place of ?. If Q has k question
marks, the corpus could contain at most $3^k$ answers. Let g(m) be the
fraction of "possible" answers which are in the corpus. Thus,
$fm = g(m) \cdot 3^k$. If, for example, $g(m) = 1/\sqrt{m}$, then the
expected number of searches increases with m and k approximately as

$$\frac{m^2 + 3m}{2m\sqrt{m}}\,3^k - \frac{2m^2 + 3m}{2(m\sqrt{m})^2}\,9^k$$


If  Corp  (Q)  is  not  finite,  the  answer-procedure  may  not  terminate 
in  a  finite  number  of  steps,  depending  on  the  decidability  of  Q. 

We  conclude  by  illustrating  a  pictorial  query  which  corresponds  to 
the question: "In a given set of pictures which are in a given corpus, is
it true that each circle which is to the left of a square is below a
triangle?" 

Boxes  marked  by  X  can  have  any  of  the  three  objects  of  VT  inside. 
IV. A TREE LANGUAGE


The terminal vocabulary of the tree-language system S(L_T), which we will
carry as our running illustration, contains: [Figure: three labeled trees,
each with root Sh] (Sh is mnemonic for "Shape"). These labeled trees
correspond to ○, □, and △ in the terminal vocabulary of the pictorial
language system. We use them so that the tree-language consists exclusively
of trees, just as the pictorial language consisted exclusively of diagrams.
It would be just as well to use ○, □, △ in place of these trees, as
terminal nodes. To motivate the use of these particular trees, we read the
tree with root Sh and branches x and y as stating "each polygon in
configuration x has more corners than any polygon in configuration y." A
dot, as in [figure], denotes a specific object.

The non-terminal vocabulary of S(L_T) contains labeled trees like
[figure]. We state next some of the rules for S(L_T).

Rule T2: [figure]

This rule states that any tree with H, I, V or Sh in place of the
circle can be replaced by the tree with root I.

Rule T3a: [figure]

The right-hand side of this rule designates a pair of trees both
joined at the same node above them, as [figure]. As before, ○ stands for
H, I, V or Sh. The left-hand tree is any tree with H, V but not Sh or I, in

To facilitate reading, we remind the reader that I indicates "enclosure",
H "to the left/right of", V "above/below", Sh "Shape", ○ a variable for
H, I, V or Sh; ● a variable for just H or V.
bottom of Fig. 4.1. Then, substituting white circles for the Sh and black
circles for the V, apply rule T3a to the right side of Fig. 4.1; substituting
white circles for the Sh and the black circles for I, apply rule (T3d) to
the left side of Fig. 4.1; with I and V in place of the two white circles
on the right-hand side of rule T3a and H in place of the black circle on
its left, apply the rule to get [figure]. With H in place of the white
circle, apply rule T2.

As  in  the  case  of  the  pictorial  language,  the  rules  of  formation  for 
the  tree  language,  with  a  suitable  extension  of  the  idea  of  concatenation, 
cast  the  grammar  of  the  tree-language  in  the  context-free  class.  The 
questions  of  recognizing  well-formed  formulas  in  the  language  are  thus 
special  cases  of  context-free  languages  in  general.  It  is,  however,  of 
interest  to  examine  the  tests  for  well-formedness  in  more  detail,  for  more 
specialized versions of the tree-language. We should also like to look
at the connections between these specialized versions and their correspondents
in  the  pictorial  language. 


A. The Tree-Language Specialized for Expressing Descriptions

We now add a distinguished element to the non-terminal vocabulary of
S(L_TD). We call it D. We now introduce rule TD1: D → [tree]. Actually, D
will appear as [figure], and when rule TD1 is applied, [figure] is erased
and D remains. This marks completion of the verification that the tree is a
member of L_TD.

We will also call such trees D-trees. By a parsing of a D-tree we will mean
a diagram showing the rules used, and the order of their use. If we complete
Fig. 4.1 by adjoining D to the upper left part of the tree, the use of rules
to verify that Fig. 4.1 is in L_TD can be described in Fig. 4.3. Rules
appearing on the same line in this tree are applied simultaneously; the
order is immaterial; rules on upper levels are applied after rules on lower
levels.

We will call this a parsing tree.
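The reading convention for a parsing tree (lowest level first, rules on the same level in any order) can be sketched as follows. The nested-tuple encoding `(rule, child, ...)` and the function name are assumptions of this sketch, and the example tree is made up for illustration; it is not a transcription of Fig. 4.3.

```python
def application_levels(tree):
    """Group rule labels by depth and list the groups from the lowest level up."""
    levels = {}

    def walk(node, depth):
        rule, *kids = node
        levels.setdefault(depth, []).append(rule)
        for kid in kids:
            walk(kid, depth + 1)

    walk(tree, 0)
    # Deeper levels are applied first; the root rule is applied last.
    return [levels[depth] for depth in sorted(levels, reverse=True)]


# Hypothetical parsing tree using rule names that appear in this section.
example = ("TD1", ("T2", ("T3a", ("T4a",), ("T4a",))))
print(application_levels(example))
# [['T4a', 'T4a'], ['T3a'], ['T2'], ['TD1']]
```

Each inner list holds the rules that may be applied simultaneously, and the outer list gives the order in which the levels are processed.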

Theorem 4.1: To each element of L_GD corresponds at least one element
of L_TD.
Proof: We will show that the rules of S(L_GD) and S(L_TD) are equivalent
under appropriate identifications. Identify ○, □, and △ with the three
terminal Sh trees [figures]. Next, identify U with D. Next, identify
the dotted-line rectangle with [trees], where the I tree is such that only
D or Sh can hang on its left bottom branch. Identify the horizontal pair
[figure] with the H tree, and also the vertical pair [figure] with the V
tree, and the enclosure figures [figures] all with the I tree, where I is
again such that only Sh can hang on its left bottom branch.

Rule (1) of L_GD, which states [figure] → [figure], thus corresponds to
D → [tree], or rule TD1. Rule (2a) of L_GD stated: [figure] → [figure]. This
now becomes: (i) I → H, using the above identifications of the dotted-line
rectangle with I. There is nothing like I → H with [tree], because we
allowed only Sh to hang in the left bottom branch of I. Rule (2b) of L_GD
stated: [figure] → [figure]. This now becomes: (ii) I → V. Rules (2c)-(2e)
of L_GD all become: (iii) I → I.

We now use the identification of the dotted-line rectangle with ○,
specializing the latter to Sh. Rule (3a) of L_GD, which is
[figure] → [figure], becomes Sh → Sh [trees]. This is half of rule T4a. The
other half, Sh → Sh [trees], is obtained by identifying [figure] with
[tree]. In a similar manner, we can identify rules (3b) and (3c) of L_GD
with T4b and T4c.

We now identify the dotted-line rectangle with I once more and apply rules
(3a), (3b), (3c) of L_GD again. We wish to show that I → Sh follows. But
(3a), (3b), (3c) of L_GD would result in [trees], just as if Sh were
substituted for I. We write this substitutability condition as (iv) I → Sh.
Now, (i), (ii), (iii), (iv) can all be put together to state I → ○, which
is rule T2.
It remains only to verify that rules T3(a-d) follow from the rules
of L_GD. The symbols [tree] and [tree] are each identified with I, since
they are special cases of [tree]. With [tree] we identify both [figure] and
[figure], for which we can substitute [tree] and [tree] respectively. Hence
[rule] and [rule] follow. This leaves only (T3b) and (T3d) to derive, and
this follows directly from the definition of I and [tree], namely, that
only Sh can be attached to the left bottom branch. Hence all the rules of
S(L_TD) follow from the rules of S(L_GD). No rules are implied by S(L_GD)
which are not in S(L_TD).


A given picture in L_GD can be obtained from the rules of S(L_GD) in
at least one way. For example, [figure] can be obtained by applying rules
(3b), (3b), (3b), (2b), (2a) and (1), or by applying rules (3b), (3b), (3b),
(2a), (2b) and (1). This corresponds to the two D-trees: Figs. 4.4 and 4.5.


[Fig. 4.3: parsing tree; rule labels from the top level down: TD1; T2; T3a;
T3a and T3c; T4a (four times).]

[Fig. 4.4 and Fig. 4.5: the two D-trees, built from nodes labeled D, H, V
and Sh.]


This  proves  the  theorem. 

B. The Tree-Language Specialized for Expressing Constructions

The distinguished element of the non-terminal vocabulary of S(L_TC)
which corresponds to the unit of L_TC is designated by C. Otherwise, the
terminal and non-terminal vocabulary is as it was for S(L_TD). All rules
are unchanged except TD1, which goes to TC1 with C in place of D. A
construction-tree thus looks exactly like a D-tree, except that D is
replaced by C.

We must add one more rule, however. In verifying that a given tree
belonged to L_TD, we replaced the tree on the right-hand side of a rule by
the left-hand side of the rule, and completed verification when the process
ended with symbol D. Now we will not replace the right-hand side of a rule
by the left-hand side, but simply check that the tree indicated on the
right-hand side of the rule is properly attached to the tree on the
left-hand side of the rule. We call this rule TC5.


Theorem 4.2: The construction languages L_GC and L_TC are in 1-1 correspondence.

Proof: Recall that an element of L_GC is a picture with every configuration
enclosed in a rectangle of dotted lines, and a frame of wavy lines. It is
easy to see that TC1 corresponds exactly to rule (1) of S(L_GC) and TC5
exactly to rule (4) of S(L_GC). The other rules of S(L_GC) are the same as
those of S(L_GD), and the correspondence between the rules of S(L_TD) and
S(L_GD) has been established in the preceding theorem.

It remains to show that to each tree in L_TC corresponds a unique
element of L_GC. The tree in L_TC specifies a particular sequence of rules
to be applied in a specified order, except for rules at the same level.
These rules of S(L_TC) correspond uniquely to rules of S(L_GC), to be applied
in the same order. This generates exactly one "parsed" diagram or element
of L_GC.


Fig. 4.6        Fig. 4.7        Fig. 4.8        Fig. 4.9


For example, the tree in Fig. 4.6 results in the construction specification
shown in Fig. 4.7. The parsing trees of rules are shown in Figures 4.8
and 4.9.

Theorem 4.3: To any construction specification tree in L_TC corresponds a
unique picture in L_GD.

Proof: This follows from theorem 4.2 and theorem 3.2.

We first construct a pictorial construction statement and then
proceed to construct the picture with rules C1-C3c of Section III.


C. The Tree-Language Specialized for Posing Queries

In parallel to the study of previous subsystems we add to V_N the
distinguished element designated by Q, and to V_T the element ?. The rules of
S(L_TQ) are the same as those of S(L_TC), except that rule TC1 is replaced by
rule TQ1, which has Q in place of C. A typical query-tree is shown in Fig. 4.10.


Fig. 4.10

We need only identify Q with its pictorial counterpart to be able to state
the following theorem:

Fig. 4.11


We now wish to extend L_TQ in order to take advantage of the fact that
a tree embodies many implications due to the ordering of certain relationships.
We will introduce rules that allow us to form a query in which we replace a
path in a D-tree which has H as vertex by the path H' connecting the same
end-points. Similarly we introduce corresponding rules for V and V', and
for I and I'. Call these rules TQ6a, TQ6b and TQ6c. Thus we can replace the
tree of Fig. 4.12 by that of Fig. 4.13, or by that of Fig. 4.14, or by that
of Fig. 4.15, or by that of Fig. 4.16.

Fig. 4.12    Fig. 4.13    Fig. 4.14    Fig. 4.15    Fig. 4.16


In words, these would ask: Is there a circle to the left of a square? Is
there a triangle to the left of a square? Is there a triangle above a
circle? Is there a triangle inside a circle? If a Q-graph has question-marks
alongside a node name, the relationship indicated is to be verified
rather than filled in.

We shall call the augmented tree-query-language L'_TQ.
Unlike the answering-procedure used within a graphic language, we
do not search the corpus of parsing trees but the corpus of D-trees themselves.
A search procedure somewhat analogous to Q1-Q8 in Section III follows.

TQS1: Apply rules TQ6a, TQ6b, TQ6c where applicable to all D-trees in the
corpus.

TQS2: Compare the transformed query tree with each D-tree in the given
corpus, ignoring: 1) a failure to match between nodes where there was
a ? in the Q-tree; 2) a failure to match Q and D. Thus, the Q-tree
of Fig. 4.10 matches the D-tree of Fig. 4.12. Rule TQS1 did not
apply.

If the Q-tree were that of Fig. 4.13 instead of Fig. 4.10, rule TQS1
would have been applied, and Fig. 4.12 would have been transformed
into Fig. 4.17, among the many transformations that would
have been possible. This transformation results in a match.

Fig. 4.17


TQS3: Apply the rule C → Q, i.e., replace Q by C in all Q-trees. Also
replace each question mark in the Q-tree by either: 1) the subtree
of the matching D-tree which makes the match complete (except for
D and C); this subtree will begin with [symbol lost] and it will be
marked [symbol lost]; 2) the symbol Y (to indicate "Yes") if the
question-mark was next to a node as in Fig. 4.13, and the node label
was the same in the Q-tree as in the matching D-tree; 3) the symbol N
(to indicate "No") if, in the above case, the node label next to ? in
the Q-tree is different from that in the matching D-tree; 4) the
symbol M if there is no matching D-tree; also replace the vertex label
by the one in the matching but non-verifying D-tree.

TQS4: Applying the above rule results in a C-tree, specifying a construction.
Execute the indicated constructions. This results in an answer-tree.
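
The comparison in step TQS2 can be illustrated with a short sketch. This is not the report's program; it assumes a hypothetical encoding in which a tree is a nested Python tuple (label, child, ...), a shape is a plain string, and a question mark is the string "?". The names `matches` and `search` are invented for the illustration. Only the case where a question mark stands in place of a subtree is covered, and rules TQ6 are not applied.

```python
def matches(q, d):
    """Return True if query subtree q matches data subtree d."""
    if q == "?":
        # A question mark in the Q-tree matches any subtree of the D-tree.
        return True
    if isinstance(q, str) or isinstance(d, str):
        # Terminal nodes must carry the same label.
        return q == d
    q_label, *q_kids = q
    d_label, *d_kids = d
    if {q_label, d_label} == {"Q", "D"}:
        pass  # the failure to match Q and D is ignored
    elif q_label != d_label:
        return False
    return len(q_kids) == len(d_kids) and all(
        matches(qk, dk) for qk, dk in zip(q_kids, d_kids)
    )


def search(query, corpus):
    """Return every D-tree in the corpus that the Q-tree matches."""
    return [d for d in corpus if matches(query, d)]


# Hypothetical two-tree corpus and a query with one open position.
corpus = [
    ("D", ("H", "circle", "square")),
    ("D", ("V", "triangle", "circle")),
]
query = ("Q", ("H", "?", "square"))
print(search(query, corpus))
```

Replacing each "?" by the subtree it was matched against would then give the C-tree of step TQS3.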

D.  The  Tree-Language  Specialized  for  Stating  Answers 

Two answer-trees are illustrated in Fig. 4.18. The procedure for
constructing an answer-tree from a C-tree is simply to replace C by A and
to copy the rest of the C-tree.

By identifying the tree symbols with their pictorial counterparts, we can
obtain a 1-1 correspondence between answer-trees of L_TA and the pictorial
answers of L_GA. By L_TA we understand the set of all possible
trees of the type illustrated in Fig. 4.18b. There is no pictorial answer
corresponding to Fig. 4.18a. The sublanguage L'_TA introduces trees like
those of Figs. 4.18a and 4.18b. In the sense that L'_TQ and L'_TA contain
trees not in L_TQ and L_TA respectively, these are more powerful languages.

We could try to extend the graphic languages L_GQ and L_GA so that they
correspond more closely to L'_TQ and L'_TA, as illustrated in Figs. 4.19 and
4.20. The pictorial query corresponding to Fig. 4.13 would be:


Fig. 4.19

The question-mark appearing in the outer box of dotted line indicates that
verification of the relationship indicated by that box is in question.
The pictorial answer corresponding to Fig. 4.21 is shown in Fig. 4.20. The
curly frame around the vertical configuration consisting of two circles
indicates the answer to the questioned relation. If, in Fig. 4.21, there
were NV (to indicate "no, it is vertical") in place of YV (to indicate "yes,
it is vertical"), then Fig. 4.20 could still be an answer-tree, but to a
query which had, say, ?H instead of ?V at the corresponding node. But
there is no way of representing the relations H', V' and I' in the pictorial
languages. From this point of view, the tree-languages have greater power
of representing queries and answers than do the graphic languages.

Theorem 4.3: There exist query-trees and answer-trees in a tree language
for which there are no corresponding representations in the pictorial
languages L_GQ and L_GA.

Step TQS2 of the preceding section is critical for taking advantage
of a tree-language. In the first place, it is well to note that in
query-answering, even in the pictorial languages, we compared trees: the
parsing trees made up of rules corresponding to pictures of L_GD.

In step TQS2, we ignore the Q and D as well as question-marks in
comparing a given Q-tree with a corpus of D-trees. Hence we will strip
the trees of the corpus and of the query of their question-marks and of
Q and D at the top. If the query has no H', V' or I', we search
the corpus for a stripped D-tree matching the stripped Q-tree. If the
query has H', V' or I', we extend the corpus by applying rules TQ6a, TQ6b,
TQ6c, and then search the extended corpus. In extending the corpus, we
apply only the one of the tree rules indicated by the Q-tree; we apply the
rule by identifying first the pair of Sh-trees indicated in the Q-tree.
We then trace up from those terminal nodes of a candidate tree in the corpus,
simultaneously from both terminal points, and check whether the vertex of
the path is as prescribed in the Q-tree.

In comparing a stripped D-tree of the extended corpus with the
stripped Q-tree, we also begin simultaneously at all terminal points and
trace up the paths. We call a mismatch as soon as one of the vertices,
other than one which had a question-mark next to it, fails to match, and
proceed to another D-tree of the corpus. We do not examine trees to which
rules TQ6 were applied, only the trees that result.

The D-trees and the trees corresponding to parsed pictures are in
1-1 correspondence. We have seen that many D-trees correspond to a given
picture. We will call these equivalent. If a given picture corresponds
to the answer to a query, there will again be many queries which correspond
with the same answer. We will call these queries equivalent. We now seek
a representative, a canonical member, of each of these two equivalence
classes so that only one comparison between the canonical query and a
canonical D-tree is needed. This will minimize the number of D-trees in
the corpus that have to be compared.

V. OPTIMAL TREES FOR STORAGE AND SEARCH

The criterion for choosing the canonical representative of an equivalence
class of trees is minimization of the expected number of elementary
node-deletions necessary to transform a D-tree into the tree specified by
H', V', or I' in a Q-tree. To define an elementary node-deletion, consider
an algorithm for applying rules TQ6. Suppose that the given Q-tree is


[Q-tree with vertex H' over terminal nodes T1 and T2]

where T1 and T2 are two terminal nodes corresponding to, say, two Sh
subtrees. Suppose that the following is a path in a D-tree

[path from T1 up to a vertex and down to T2]

with the same vertex and terminal nodes. To transform this path into
H' over T1 and T2, we start with T1 or T2, whichever is lower in the D-tree.
We trace to the node on the next level up and delete it, proceeding in this
way until we are at the level of T1 or T2, whichever was higher in the
D-tree. We now move up to the next-level node on both sides of the path,
and check whether the paths intersect. If not, we delete both nodes, move
up to the next level and repeat. If so, we check that this top-most vertex
is H (as prescribed in the Q-tree) and terminate the process. (If it is not
H we substitute what it is and place N next to it in forming the
answer-tree, as indicated before.)
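
The tracing procedure just described can be sketched as a count of deleted nodes. The sketch below is an illustration under stated assumptions, not the report's program: trees are hypothetical nested tuples (label, child, ...), terminal nodes are uniquely named strings, the top-most node is at level 0, and every node lying strictly between a terminal node and the vertex where the two upward paths meet is taken to cost one elementary deletion. The names `find` and `deletions` and both example trees are invented for the illustration.

```python
def find(tree, leaf, path=()):
    """Return the child-index path from the root to `leaf`, or None."""
    if tree == leaf:
        return path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:]):
            found = find(child, leaf, path + (i,))
            if found is not None:
                return found
    return None


def deletions(tree, t1, t2):
    """Count the nodes deleted when tracing up from t1 and t2 to their vertex."""
    p1, p2 = find(tree, t1), find(tree, t2)
    # The level of the vertex is the length of the common prefix of the paths.
    w = 0
    while w < min(len(p1), len(p2)) and p1[w] == p2[w]:
        w += 1
    # Nodes strictly between each terminal node and the vertex are deleted.
    return (len(p1) - w - 1) + (len(p2) - w - 1)


# Assumed example: a balanced tree over eight terminal nodes ...
balanced = (
    "D",
    ("H",
     ("H", ("H", "1", "2"), ("H", "3", "4")),
     ("H", ("H", "5", "6"), ("H", "7", "8"))),
)

# ... and a chain over the same eight terminal nodes.
chain = "8"
for name in "7654321":
    chain = ("H", name, chain)
chain = ("D", chain)

print(deletions(balanced, "1", "8"), deletions(chain, "1", "8"))
```

With this cost model the count depends only on the levels of the two terminal nodes and of the vertex, which are the three quantities u, v and w over which the probability p(u,v,w) is defined.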

To make clear what we mean by the level of a node in a tree, we
call the top-most node in the tree level 0. Thus, in Fig. 5.1, at level
0 we have H, at level 1, V and H, etc. Both T1 and T2 happen to be at
level 8 in that example.

Fig. 5.1

Theorem 5.1: Consider a query tree of the form Q with vertex X' over
terminal nodes T1 and T2, where X' stands for H', V' or I', and T1, T2
stand for Sh subtrees. Let p(u,v,w) be the probability of T1 being at
level u, T2 being at level v, and the vertex at which the path up from T1
intersects the path from T2 being at level w. The expected number of
elementary node-deletions to transform a circuit in a D-tree into X' over
T1 and T2 is

d  r  d  1 

Z/r—i  r— i  r— i  f— i  \  l—i  l—i  r— >  J 

iZj  'L,  +2j 

x- 1  L'  u  v=u+1  v  lf-v-M  '  w  v  w  J 


Proof:  Let  L2;  Lj  denote  the  levels  of  T2  and  and  suppose  . 

It  will' take  L^  -L^  -1  deletions  to  get  to  the  same  level.  It  will  take 
another  L^-L-1  deletions  to  reach  the  top  vertex  of  the  path  on  the  shorter 
arm,  I«2*L  the  longer  arm.  Altogether,  this  is  2L2-2L-2  deletions.  If 
L2  <  Lj,  it  will  take  2L^-2L-2  deletions.  The  probability  of  L2>L-1  being 
x  is  a  convolution, 

d  d 

r— i  r— i  t— 

^  P(L2-v,  1=v-x-1)P(L2  >  L,)-^  ^  P(L2=V,I>-V-X-1)P(L2-V,  Lj-u). 

L2,  Lj  u=0  v=u+1 

Here d is the lowest level of the D-tree, called its depth. Similarly the


probability  of  L^-L-l  being  x  is  ^  ^  P(I»j“u,  l*u-x"l)P(I<j“u,  ■  v). 


v«=1  ufv+1 


i  *  i  «  »  •  i 

We  can  define  q(u,v)  s  )  p(u,v,w)  “  P(Lj«=u,  *  v).  The  total  probability 

w 

of  having  x  deletions  is  now 

d  d 

r— i  r— »  r-^  r*-i  t— i 

)  2  p(u,v,v-x-1)q(u,v)  +2  )  P(u>v,v-x-1)q(u,v)  +  )  P<v,v,v-x-1)q(v,v). 

Al'vn&t  vfe+l  v 

The expected number of deletions is the sum of this expression over x from
1 to d.

QED.

If p(u,v,w) is a uniform distribution, the expected number of node
deletions will be proportional to d, the depth of D-trees. For distributions
with a smaller variance than the uniform, the expected number may grow more
slowly than d but never faster. Hence, minimizing d will minimize the upper
bound on the expected number of deletions. Thus, if there is a number of
equivalent D-trees that correspond to the same picture, use as a canonical
representative of this equivalence class of D-trees the one with smallest d.

To illustrate, the two D-trees shown in Figures 5.2 and 5.3 are
equivalent representations of the picture shown in Fig. 5.4.
For the tree in Fig. 5.2, d=4; for the one in Fig. 5.3, d=8. Generally, d
cannot be less than log2 n, where n is the number of objects or terminal
nodes.
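
The choice of a canonical representative by smallest depth can be sketched as follows. This is a minimal illustration, assuming a hypothetical nested-tuple encoding (label, child, ...) with strings as terminal nodes; the function names and the two example trees (a balanced tree and a chain over eight terminal nodes, chosen so that their depths are 4 and 8) are assumptions made for the example and are not taken from the figures.

```python
import math


def depth(tree):
    """Lowest level reached in the tree, with the top-most node at level 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def terminals(tree):
    """Number of terminal nodes (objects) in the tree."""
    if not isinstance(tree, tuple):
        return 1
    return sum(terminals(child) for child in tree[1:])


def canonical(equivalent_trees):
    """Pick the tree of smallest depth from a class of equivalent D-trees."""
    return min(equivalent_trees, key=depth)


balanced = (
    "D",
    ("H",
     ("H", ("H", "1", "2"), ("H", "3", "4")),
     ("H", ("H", "5", "6"), ("H", "7", "8"))),
)

chain = "8"
for name in "7654321":
    chain = ("H", name, chain)
chain = ("D", chain)

for tree in (balanced, chain):
    # The depth can never fall below log2 of the number of terminal nodes.
    print(depth(tree), terminals(tree), depth(tree) >= math.log2(terminals(tree)))

print(canonical([chain, balanced]) == balanced)
```

Selecting by `depth` alone is the rule stated in the text; ties between trees of equal depth are broken here simply by order of appearance, which is an arbitrary choice of the sketch.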

One of the shortcomings of a tree-query-language like L'_TQ is that
it does not allow us to ask a question like H' over 1 and 8, where 1 and 8
denote the first and 8th circle in Fig. 5.4. To ask such queries we must
introduce naming, which leads into the topic of name-languages to be treated
in Section VI. Note that if we could ask this query, transforming the tree
of Fig. 5.2 into H' over 1 and 8 would require 4 node-deletions;
transforming the tree of Fig. 5.3 into it would require 6 node-deletions.
Thus, the tree of Fig. 5.2 should be used as the standard representative of
the picture of Fig. 5.4. A corpus of pictures to be searched for answers to
queries will henceforth be stored in terms of such corresponding standard
D-trees. Some of the details of how to store such trees in a computer
memory, how to index a corpus of such stored trees, and the programs for
search, all aimed at efficiency, are given in Scient. Rpt. No. 3.

Trees and pictures are not at the same level. If a tree is used to
describe the same data described by a picture, the tree corresponds to the
parsed picture. A parsed tree corresponds to a parsing of a parsed picture.
We can think of a tree-representation of data as further removed from an
iconic representation of the data than is the pictorial representation. The
English-like, symbolic language to be studied next is even further removed
than the tree-language. The loss of iconic resemblance between the
representational symbols and their designata is compensated by increased
power of generalization, expressibility, and inference.

VI. ENGLISH-LIKE LANGUAGES

A. An English-Like Language for Describing "Pictorial" Data

A description of Fig. 6.1 in English words might read as follows:
"Fig. 6.1 is a picture which consists of a square, two circles, and a
triangle in vertical alignment. The square is at the top, the triangle at
the bottom." The basic complete unit in this language, corresponding to a
picture or a D-tree, is a paragraph, such as the above. Corresponding to a
"configuration" in a picture, or to a subtree in a D-tree, is the sentence.
This is a unit of the non-terminal vocabulary. Spaces and selected English
words constitute the terminal vocabulary.

Fig. 6.1

In what follows, we will confine our attention to paragraphs in
standard but English-like form. Elsewhere we shall provide rules for
transforming less constrained paragraphs, such as the one illustrated
above, into these standard forms. The standard form paragraph will consist
of a single sentence. We will denote this form by DPGSENT (mnemonic:
Descriptive paragraph sentence). It is the unit U of the linguistic system,
i.e., the distinguished element of the non-terminal vocabulary. Important
other members of the non-terminal vocabulary are PICT, SPEC, NREL, SH, etc.

The description of Fig. 6.1 in tree language is shown in Fig. 6.2.

Fig. 6.2

All elements of the non-terminal vocabulary will here be written
in capitals; those of the terminal vocabulary in lower-case letters, except
for the proper names of individual objects which begin with a capital
letter and are underlined.

The rules of S(L_ND) are:

ND1:  DPGSENT → NI + PICT

The right-hand side of this rule specifies a concatenation of two
units, with a space between them, unit NI being to the left of unit PICT.
These units, in turn, are specified by similar rules. Such rules are applied
repeatedly until a string of words in the terminal vocabulary results.

ND15: NI → One, Two, Three, ...   (Mnemonic for NI: "Name" of Individual)
         → 1, 2, 3, ...

The commas in this and similar rules denote "or". The terms on the
right-hand side are elements of the terminal vocabulary; any one of them
could be substituted for NI in ND13, and again in ND1. The three dots in
ND15 are to suggest that any word "like" the first three, i.e., any English
word that begins with a capital letter and is underlined, can also be
substituted for NI.


ND2:  PICT → IS + SPEC
ND3:  SPEC → A + SP
ND4:  SP → PCT + CSOF
ND5:  CSOF → W + COF
ND6:  a) COF → CONS + OBJS
      b) COF → AND + OBJS
ND7:  OBJS → NI + CLS
ND8:  CLS → W + CL
ND9:  CL → IS + PROP
ND10: PROP → SH + REL
ND11: REL → AND + RELN
ND12: a) RELN → NREL + CONT
      b) RELN → the last object.
ND13: a) CONT → NI + REL
      b) CONT → NI + SC
ND14: SC → ; + COF
ND16: IS → is
ND17: A → a, an
ND18: PCT → picture
ND19: W → which
ND20: CONS → consists of:
ND21: a) SH → circular
      b) SH → square
      c) SH → triangular
ND22: AND → and
ND23: a) NREL → below
      b) NREL → to the right of
      c) NREL → to the left of
      d) NREL → enclosing
      e) NREL → above
      f) NREL → inside

In applying a rule like ND23b, the phrase "to the right of" is treated
like a single word, as if the 3 spaces were not present. To describe
Fig. 6.1, we have:

"Figure 6.1 is a picture which consists of: One which is square
and above Two; and Two which is circular and above Three; and
Three which is circular and above Four; and Four which is
triangular and below Three and the last object."

There are 41 words in this sentence (counting "consists of" and "the last
object" as single words and the semicolons as words) and these are identified
as follows:

Sentence words:                      Figure 6.1  is    a     picture  which  consists of:
Corresponding non-terminal symbols:  NI          IS    A     PCT      W      CONS          etc.
Corresponding rule:                  ND15        ND16  ND17  ND18     ND19   ND20

The language defined by these vocabularies and rules will lead to
some "sentences" which are unwanted; for example, nothing makes us discard
a sentence having in it the clause "above Two and below Two", though this
is evidently contradictory. It will not generate very ungrammatical
sentences. Many grammatically better and equivalent statements are missed.
We do, however, claim:

Theorem 6.1: The name-language L_ND is equivalent to L_GD.

Proof: In identifying items in the terminal vocabulary of S(L_GD) with
items in the terminal vocabulary of S(L_ND) we must identify two levels of
names: proper names of individual object tokens and generic names of
object-types. (This is what we did not try to do in non-name languages.)
Identify the symbol for the picture as a whole with a proper name like
Figure 6.1 and with the word "picture" in the form "Figure 6.1 is a picture
which consists of:". Identify O with a proper name as well as with
"circular"; etc.

Given any picture in L_GD, we always begin by forming the phrase
"Name is a picture which consists of:", according to rules ND15, 16, 17,
18, 19, 20. Here Name stands for an arbitrary name we assigned to the
picture. We now assign different names to all the objects in the picture.
We then form a clause for each object, which always starts with "Name which
is (Shape) and ...". In place of Shape we insert circular, triangular or
square. We must form as many such clauses as we have different names. We
complete the clause, filling in the "...", by "to the right of Name and to
the left of Name and above Name and below Name and inside Name and (the
last object.)", however many of these apply. We excluded mention of
"enclosing Name", for that information is picked up when the latter name
appears at the head of a clause. Otherwise, however, we do not try to
eliminate redundant information. This procedure will: a) form a well-formed
sentence according to rules ND1 - ND23; b) describe the given picture.
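
The clause-forming procedure of this proof can be sketched in a few lines. The sketch is illustrative only and rests on assumptions that are not in the report: a picture is given as a hypothetical list of (name, shape, relations) triples, the function name `describe` is invented, and no check is made that every object other than the last carries at least one relation, as rules ND10 to ND13 require.

```python
def describe(picture_name, objects):
    """Build a standard-form descriptive sentence.

    objects: list of (name, shape, [(relation, other_name), ...]) triples,
    listed in the order in which their clauses are to appear.
    """
    clauses = []
    for name, shape, relations in objects:
        clause = f"{name} which is {shape}"          # NI + W + IS + SH
        for relation, other in relations:
            clause += f" and {relation} {other}"     # AND + NREL + NI
        clauses.append(clause)
    body = "; and ".join(clauses)                    # SC, then COF -> AND + OBJS
    return f"{picture_name} is a picture which consists of: {body} and the last object."


# The four vertically aligned objects of Fig. 6.1, named as in the text.
fig_6_1 = [
    ("One", "square", [("above", "Two")]),
    ("Two", "circular", [("above", "Three")]),
    ("Three", "circular", [("above", "Four")]),
    ("Four", "triangular", [("below", "Three")]),
]
print(describe("Figure 6.1", fig_6_1))
```

The relation phrases are passed in as ready-made strings such as "to the right of", which mirrors the convention that such a phrase is treated as a single word.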

We can construct an equivalent picture from any sentence in L_ND by
making the appropriate identifications.

To say that L_GD and L_ND are equivalent is to say that we can transform
each "sentence" of one language into a "sentence" of the other language,
not that there is a 1-1 correspondence between the two languages. Indeed,
to a given picture correspond many sentences of L_ND, and they are
informationally equivalent to each other. Conversely, to each sentence of
L_ND correspond a number of equivalent pictures which differ in metric and
other respects we have here ignored.

By the parsing tree associated with a sentence of L_ND, we mean the
tree obtained as a result of applying various rules to the sentence. The
tree has at its nodes the vocabulary items and rule names. As an example
consider the parsing tree for the sentence stated at the beginning of this
section; it is shown in Fig. 6.3.


B. The English-Like Language for Specifying Constructions

We wish to construct a language of imperative sentences such that
executing the directives results in sentences of L_ND. We shall use the
word "order" to designate the units of S(L_NC). In constructing S(L_NC) it
is helpful to think of the pictures associated with the sentence of L_ND
which is generated on execution of an order. In this picture each object,
each configuration and the picture itself is assumed to be named. A typical
order is: "Name 1 is an order specifying: configuration 1 enclosing
configuration 2 above configuration 3; configuration 2 enclosing Name 2
which is square and enclosing configuration 4; configuration 3 enclosing
configuration 5 to the left of configuration 6; configuration 4 enclosing
Name 3 which is circular; configuration 5 enclosing Name 4 which is
circular and enclosing configuration 7; configuration 6 enclosing Name 5
which is triangular; configuration 7 enclosing Name 6 which is circular."

We will specify rules of construction, like C1-C6c in Section III,
which take the above statement into a pictorial construction statement in
L_GC. We must first develop rules for forming statements like the above.


These are:

NC1:  CPGSENT → NI + ORDER
NC2:  ORDER → IS + A + ORDSP
NC3:  ORDSP → ORD + SPING + SPEC
NC4:  SPEC → VAR + ENCL
NC5:  VAR → CONF + NUM
NC6:  ENCL → ENC + PR
NC7:  a) PR → VAR + NREL + VAR + SC
      b) PR → NI + CL
NC8:  CL → W + IS + REST
NC9:  a) REST → SH + SC
      b) REST → SH + AND + EN
      c) REST → SH + PER
NC10: EN → ENC + VAR + SC
NC11: SC → SMICOL + SPEC
NC12: ORD → order
NC13: CONF → configuration
NC14: NUM → 1, 2, 3, ...
NC15: ENC → enclosing
NC16: SPING → specifying:
NC17: IS → is
NC18: A → a, an
NC19: W → which
NC20: PER → .
NC21: SMICOL → ;

Rules for SH and NREL are the same as rules ND21, ND23; NI as in ND15.

There is a last important rule which cannot be expressed in the form
of a production. It is NC22: The numerals following every occurrence of
the word "configuration" must be assigned so that each string of terminal
elements between semicolons begins with "configuration i..." in the order
i = 1, 2, 3, ..., n. No numeral larger than n can appear anywhere. Each
numeral except 1 must occur exactly twice.
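
Because NC22 is a counting condition rather than a production, it lends itself to a direct check. The following sketch is one possible reading of the rule, offered under assumptions: the order is given as a single string, the clauses are whatever lies between semicolons after the word "specifying:", and the function name `check_nc22` and the sample orders are hypothetical.

```python
import re


def check_nc22(order):
    """Return True if the configuration numerals in an order satisfy NC22."""
    spec = order.split("specifying:", 1)[1]
    clauses = [clause.strip() for clause in spec.split(";")]
    n = len(clauses)
    counts = {}
    for i, clause in enumerate(clauses, start=1):
        # The i-th clause must begin with "configuration i".
        if not clause.startswith(f"configuration {i} "):
            return False
        # Count only numerals that follow the word "configuration".
        for numeral in re.findall(r"configuration (\d+)", clause):
            counts[int(numeral)] = counts.get(int(numeral), 0) + 1
    # No numeral larger than n may appear anywhere.
    if any(k > n for k in counts):
        return False
    # Every numeral except 1 must occur exactly twice.
    return all(counts.get(k, 0) == 2 for k in range(2, n + 1))


good = ("Name 1 is an order specifying: "
        "configuration 1 enclosing configuration 2 above configuration 3; "
        "configuration 2 enclosing Name 2 which is square; "
        "configuration 3 enclosing Name 3 which is circular.")
bad = good.replace("above configuration 3", "above configuration 2")
print(check_nc22(good), check_nc22(bad))
```

The numerals in object names such as "Name 2" are deliberately ignored, since NC22 speaks only of numerals following the word "configuration".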

Given a statement like the one illustrated above, we parse it first:
that is, we form a labelled bracketing or, equivalently, a parsing tree,
which is shown in Fig. 6.4.

The figure, for brevity, covers only the first two clauses of
the specification in the order. At each node of this tree should also be
the name of the rule used to obtain the non-terminal element at that node.
Thus, the top node is the left-hand side of rule NC1.

To construct a pictorial order, proceed as follows, using the
parsing tree illustrated in Fig. 6.4.

CN1  Examine the top node. If it is CPGSENT (NC1), draw a wavy line
     frame.

CN2  Scan the tree from the top down to the first occurrence of SPEC (NC4).
     When located, draw a dotted line square inside the wavy line frame.
     To determine the order in which this construction proceeds, trace
     from SPEC to the nearest NUM; it should, in this step, be 1.

CN3  Trace down the tree from SPEC to ENCL (NC6). Trace one step down to
     locate PR. If Rule (NC7a) applies, draw two dotted-line squares, one
     above the other or side by side, inside the dotted line square just
     completed, depending on whether rule (ND23e) or (ND23c) applies.

CN4  If Rule (NC7b) applies at PR, trace down to REST. If Rule (NC9a)
     applies at REST, draw a circle, a square, or a triangle inside the
     dotted-line square just drawn, depending on whether (ND21a), (ND21b)
     or (ND21c) applies at SH.

CN5  If Rule (NC9b) applies at REST, draw a circle, a square, or a triangle
     enclosing a dotted-line square inside the square just drawn,
     depending on whether (ND21a, b, or c) applies at SH.

CN6  If Rule (NC9c) applies at REST, the construction is completed.

Fig. 6.4


Theorem  6.2:  To  each  construction  order  in  corresponds  a  unique 
pictorial  construction  specification  in  L__. 

Proof;  Every  element  of  has  a  parsing  tree  with  nodes  labeled  CPGSENT, 
SPEC,  ENCL,  REST,  SH  by  rules  of  S(l^,).  Hence  the  algorithm  CN1-CN6  applies 
to  each  statement  of  There  is  only  one  way  of  tracing  down  the  parsing 

tree,  so  that  the  nodes  specified  in  CN1-CN6  are  reached  in  a  unique  order 
if  we  parse  the  sentence  of  L^c  in  a  particular  way  (e.g.,  from  left  to 
right) . 


We must show that the result of applying steps CN1-CN6 is always
an element of L_GC. To show this, note that steps CN1 and CN2 together
produce a wavy-line frame enclosing a dotted-line square, which is the
result of applying both rules (1) and (4) of S(L_GC). Rule CN3 corresponds
to rules (2a), (2b) plus rule (4) of S(L_GC). Rule CN4 corresponds to rules
(3a), (3b), (3c) plus rule (4). Rule CN5 corresponds to rules (2c), (2d),
(2e) plus rule (4). Rule CN6 ensures that the conversion process from L_NC
to L_GC terminates. Figure 6.5 illustrates the pictorial construction
specified by the statement exemplified here. The numbers attached to the
dotted-line squares indicate the order in which they were drawn. Names can
be omitted.
Fig. 6.5


Theorem 6.3: To each order in L_NC corresponds a descriptive statement
in L_ND.

Proof: Construct a pictorial construction specification, the possibility
of which is guaranteed by theorem 6.2. Then execute it, according to
rules C1-C6e, to form a picture in L_GD. Then proceed to describe that as
outlined in the proof of theorem 6.1. The result is a statement in L_ND,
as proved in theorem 6.1.

Theorem 6.4: To each order in L_NC corresponds a construction tree in L_TC.

Proof: Theorem 6.2 in Section VI asserts that L_GC and L_TC are in one-one
correspondence. Hence, by the above theorem 6.2, the result follows. To
illustrate, Fig. 6.6 shows the tree corresponding to Fig. 6.5. From this
we can immediately get the corresponding D-tree.


Fig.  6.6 
The Language S... and Stating Answers


The "simplest" queries involve verification of a specified
statement. This is similar to a query about the truth or falsity of a
proposition. Thus, we can obtain query sentences from orders simply by
replacing the beginning of an order, "Name 1 is an order specifying: ...",
by "Is it true that Name 1 consists of: ...". The remainder of the order
is unchanged.
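
The prefix replacement just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
name order_to_query and the sample clause "Name 2 is a square" are
hypothetical and do not come from the grammar of the report.

```python
# Fixed wording taken from the two sentence openings quoted in the text.
ORDER_PREFIX = " is an order specifying:"
QUERY_PREFIX = "Is it true that {name} consists of:"


def order_to_query(order: str) -> str:
    """Turn 'Name k is an order specifying: ...' into the matching query.

    Only the beginning of the sentence changes; the remainder of the
    order is copied unchanged.
    """
    name, sep, rest = order.partition(ORDER_PREFIX)
    if not sep:
        raise ValueError("not an order sentence")
    return QUERY_PREFIX.format(name=name.strip()) + rest


# Hypothetical order; the clause after the colon is a stand-in.
print(order_to_query("Name 1 is an order specifying: Name 2 is a square"))
```

The remainder of the sentence is passed through untouched, which mirrors
the remark that the rest of the order is unchanged.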

Actual queries will never specify the entire context such as all
the seven clauses in the example of Section 6.2. That is why the entire
paragraph-sentence has a name, Name 1. Special parts of a clause may be
designated for verification. A typical query might be: "In Name 1 is it
true that: Name 6 is to the left of Name 5." With a slight variation of
the beginning we can get: "In Name 1 find ? such that: Name 6 is to the
left of ?".

Similarly, the answer to such a query need not produce unwanted
(e.g., irrelevant to the query) statements of Name 1. It could be a simple
"Yes, in Name 1 it is true that: Name 6 is to the left of Name 5", or "In
Name 1, Name 5 is such that Name 6 is to the left of it".

In this section we try merely to relate L_NQ and L_NA to the
corresponding sublanguages in L_G and L_T. We will elsewhere develop a more
general query language together with algorithms to translate linguistic
queries of a deeper sort directly into efficient tree-searching programs.

Consider, first, the rules for a system S_1(L_NQ), which are:

N1Q1: QPGSENT -> QPRE + SPEC
N1Q2: QPRE -> QTR + NI + CONS
N1Q3: QTR -> is it true that

The remaining rules are ones introduced earlier, namely

ND20: CONS -> consists of:

ND15 for NI, NC4-22 for SPEC.

We can construct a pictorial query from a parsed query sentence in
this language by proceeding as in CN1-CN6, plus inserting ? into each □,
○, and △, and instead of drawing a wavy-line frame as stated in CN1 we
draw a curly-line frame. This proves:

Theorem 6.5: To every query constructed according to S_1(L_NQ) corresponds
a pictorial answer in L_GA.

This answer will be a frame with curly lines, with a curly-line frame
around each object.

By an answer in the system S_1(L_NA) we mean a sentence of the form:
"It is true that Name 1 consists of: ...". The rules are:

N1A1: APGSENT -> APRE + SPEC
N1A2: APRE -> ATR + NI + CONS
N1A3: ATR -> It is true that

The remaining rules are as in S_1(L_NQ). To construct a pictorial
answer from any such answer-sentence, draw a curly-line frame around each
□, ○ and △. By parsing a pictorial answer, then removing the curly-line
frames and applying rules N1A1, N1A2, in reverse, we can construct an
answer-sentence in L_NA. Thus we have:

Theorem 6.7: For every query constructed according to S_1(L_NQ), there is
an appropriate answer constructed according to S_1(L_NA).

System S_1(L_NQ) is limited. The rules of S_2(L_NQ) are:

N2Q1: QSENT -> QINTR + QSPEC
N2Q2: QINTR -> IN + NI + QTR2
N2Q3: IN -> in
N2Q4: QTR2 -> is it true that:
N2Q5: QSPEC -> NI + SMPR + NI
N2Q6: SMPR -> IS + SM + NREL
N2Q7: SM -> somewhere


The rules for NI, IS, NREL are as stated before. The sentence "In
Name 1 is it true that: Name 2 is somewhere to the right of Name 3" is a
typical product of these rules. We have not tried to enrich this query
language by even allowing questions about shape; we wish merely to relate
this query language to the language developed in Section IV. To do this,
we replace the trees beginning with Sh, as the terminal nodes of a Q-tree,
by proper names.
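
As an illustration of rules N2Q1-N2Q7, the following Python sketch
recognizes sentences of this shape and splits them into their parts. It
rests on assumptions that are not in the report: proper names are taken to
have the form "Name k", the list RELATIONS is a hypothetical stand-in for
the NREL alternatives (defined by rules ND23, not reproduced here), and the
names Query and parse_query are invented for the example.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical NREL alternatives; the report defines NREL by rules ND23.
RELATIONS = ("to the left of", "to the right of", "above", "below")

NAME = r"Name \d+"  # assumed form of a proper name (NI)

# QSENT -> QINTR + QSPEC, with QINTR -> in + NI + "is it true that:"
# and QSPEC -> NI + is + somewhere + NREL + NI.
QUERY_RE = re.compile(
    rf"^In ({NAME}) is it true that: ({NAME}) is somewhere "
    rf"({'|'.join(RELATIONS)}) ({NAME})$"
)


class Query(NamedTuple):
    corpus: str    # NI introduced by QINTR
    left: str      # first NI of QSPEC
    relation: str  # NREL
    right: str     # second NI of QSPEC


def parse_query(sentence: str) -> Optional[Query]:
    """Return the parts of the query, or None if the sentence does not fit."""
    match = QUERY_RE.match(sentence.strip())
    return Query(*match.groups()) if match else None


print(parse_query(
    "In Name 1 is it true that: Name 2 is somewhere to the right of Name 3"
))
```

A regular expression suffices here only because this fragment of the
grammar has no recursion; the SPEC rules of the first system would need a
proper parser.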

Theorem 6.8: To each query-sentence formed according to S_2(L_NQ)
corresponds a query-tree in L_TQ.

Proof: Suppose that the parsing tree of a sentence in L_NQ is given. At
node QSENT (N2Q1), form the Q-node [drawing not legible in this copy]. At
node SMPR (N2Q6) attach the appropriate relation subtree [drawings not
legible in this copy] to the Q-graph, depending on which of the rules of
ND23 are applied at node NREL. Attach the names specified at node QSPEC
(N2Q5) in proper order to complete the Q-tree. We will also attach the name
of the figure to be searched next to Q on the Q-tree, when we reach node
QINTR after applying rule N2Q2 in the parsing tree. The result is a tree
with the two mentioned modifications, and thus is a tree of the extended
tree-query language L_TQ defined previously (Section IV).

Theorem 6.9: To each sentence formed according to S_2(L_NQ) there
is a pictorial answer in L_GA.

Proof: First form the Q-tree in L_TQ according to the preceding theorem.
Next, process the Q-tree according to the algorithm of Section IV, searching
the corpus specified by the name next to Q. In the present extension of
our various means of representation, we suppose that each corpus that can be
searched separately is given a name, and that name is recorded with it.
Similarly, all objects are named and names are recorded with them. In
testing for match, the recorded names must coincide with names specified
in the query. From the matching trees in the corpus, pictorial answers
may be formed by the procedure indicated in TQS1-TQS4.
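
The role of recorded names in the search can be illustrated with a small
Python sketch. It is not the tree-search algorithm of Section IV: a flat
list of immediate relations stands in for the stored trees, and "somewhere"
is read, as an assumption, as reachability through a chain of the same
relation. The corpus contents and the names CORPORA and somewhere are
hypothetical.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (object name, relation, object name)

# Hypothetical named corpus; each fact records one immediate relation
# between two named objects.
CORPORA: Dict[str, List[Fact]] = {
    "Name 1": [
        ("Name 2", "to the left of", "Name 3"),
        ("Name 3", "to the left of", "Name 4"),
    ],
}


def somewhere(corpus: str, left: str, relation: str, right: str) -> bool:
    """True if `left` reaches `right` through a chain of `relation` facts.

    Only the corpus whose recorded name equals `corpus` is searched, and
    objects match only when their recorded names coincide with the query.
    """
    facts = CORPORA.get(corpus, [])
    seen: Set[str] = set()
    frontier = [left]
    while frontier:
        current = frontier.pop()
        for a, rel, b in facts:
            if a == current and rel == relation and b not in seen:
                if b == right:
                    return True
                seen.add(b)
                frontier.append(b)
    return False


print(somewhere("Name 1", "Name 2", "to the left of", "Name 4"))
```

The set of visited names keeps the search finite even if the recorded
facts happen to contain a cycle.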


We can now construct an answer system S_2(L_NA) analogously to S_1(L_NA)
and show:

Theorem 6.10: For every query constructed according to S_2(L_NQ), there
is an appropriate answer constructed according to S_2(L_NA).

Proof: The rules of S_2(L_NA) are:

N2A1: ASENT -> AINTR + ASPEC
N2A2: AINTR -> IN + NI + ATR2
N2A3: ATR2 -> it is true that:

All other rules are as in S_2(L_NQ).

We form a sentence of L_NA according to these rules from a pictorial
answer by parsing the pictorial answer, removing the curly-line frames and
applying N2A1, N2A2, N2A3, etc., in reverse. Thus, given a query, we form
the corresponding Q-tree, process it to produce a pictorial answer, then
describe the latter as a sentence in L_NA.
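
At the level of sentences, rules N2A1-N2A3 differ from the query rules only
in the phrase produced after the corpus name. The Python sketch below shows
that difference; it assumes the search has already confirmed the query, and
the function name answer_sentence is hypothetical. It bypasses the Q-tree
and the pictorial answer, so it illustrates the form of the result rather
than the procedure of the proof.

```python
QUERY_MARK = " is it true that: "    # phrase produced by QTR2
ANSWER_MARK = " it is true that: "   # phrase produced by ATR2


def answer_sentence(query: str) -> str:
    """Build the affirmative answer sentence for a confirmed query.

    The introduction (in + NI) and the specification are kept; only the
    QTR2 phrase is exchanged for the ATR2 phrase.
    """
    intro, sep, spec = query.partition(QUERY_MARK)
    if not sep or not intro.startswith("In "):
        raise ValueError("not a sentence of the assumed query form")
    return intro + ANSWER_MARK + spec


print(answer_sentence(
    "In Name 1 is it true that: Name 2 is somewhere to the right of Name 3"
))
```
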
We conclude by introducing queries with question marks. Consider
the system S_3(L_NQ):

N3Q1: QUERY -> QNMFD + QVSPEC
N3Q2: QNMFD -> IN + NI + FDST
N3Q3: FDST -> find ? such that:
N3Q4: QVSPEC -> VI + SMPR + VI
N3Q5: VI -> NI, ?

All other rules, for SMPR, are as before. The main difference is
that we can use ? in place of proper names. We transform such queries into
Q-trees modified in that names are attached to the terminal nodes and to
the Q-node. We then proceed as we did for S_2(L_NQ) to produce pictorial
answers. We construct answer-sentences from such pictorial answers
according to rules of S_3(L_NA):

N3A1: AVERY -> ANMFD + AVSPEC
N3A2: ANMFD -> IN + NI + NMST
N3A3: NMST -> COM + NIST
N3A4: NIST -> NI + IS + ST
N3A5: ST -> such that
N3A6: COM -> ,
N3A7: AVSPEC -> AI + SMPR + AI
N3A8: AI -> NI, IT
N3A9: IT -> it

The rules for NI, SMPR, IS, IN, etc., are as before. In forming an answer
in L_NA, these rules are used in reverse. The rule AI -> IT is used to
place "it" where a question mark appeared in the query, and where the
name which is now introduced in rule N3A4 appeared in a matching picture.
The sentence: "In Name 1, Name 5 is such that Name 2 is to the left of it"
is in L_NA. It corresponds to the query "In Name 1 find ? such that: Name 2
is to the left of ?".

Theorem 6.11: To every query constructed according to S_3(L_NQ), there is
an appropriate answer constructed according to S_3(L_NA).

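
The correspondence between a "find ?" query and its answer sentence can be
sketched in Python as follows. The sketch assumes that names have the form
"Name k" and that the matching name has already been found by the search;
the function find_answer and the pattern FIND_RE are hypothetical, and the
specification clause is copied rather than parsed.

```python
import re

# IN + NI + "find ? such that:" followed by the specification clause.
FIND_RE = re.compile(r"^In (Name \d+) find \? such that: (.+)$")


def find_answer(query: str, match: str) -> str:
    """Build the answer sentence for a 'find ?' query.

    `match` is the name recorded with the object found in the corpus.
    """
    parsed = FIND_RE.match(query.strip())
    if parsed is None:
        raise ValueError("not a 'find ?' query of the assumed form")
    corpus, spec = parsed.groups()
    # AI -> IT: "it" takes the place of each question mark in the clause.
    clause = spec.replace("?", "it")
    # COM + NI + IS + ST: ", <name> is such that" follows the corpus name.
    return f"In {corpus}, {match} is such that {clause}"


print(find_answer(
    "In Name 1 find ? such that: Name 2 is to the left of ?", "Name 5"
))
```

Applied to the query quoted above with the match Name 5, the function
reproduces the answer sentence given in the text.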


VII.  CONCLUDING  COMMENTS 


We have shown how to construct descriptive, constructive, interrogative
and responsive languages in graphic, tree and English-like representations
for a very simple domain of discourse. We have shown how to connect these
various sub-languages. We have seen that a tree representation has advantages
over the other two means of representation for automated storage and search
of the kind of data considered; that an English-like representation has
advantages for posing queries; that a graphic representation has advantages
for displaying answers.

We  have  not  yet  shown  how  far  we  can  extend  the  English-like  language 
toward  ordinary  English  to  pose  a  greater  variety  of  queries  of  the  same 
data;  nor  have  we  as  yet  shown  how  to  translate  directly  from  English-like 
queries  into  efficient  computer  search  programs.  Some  of  our  work  in  this 
area  will  be  presented  in  a  forthcoming  paper.  Also,  we  have  not  paid  any 
attention  to  the  important  problem  of  how  to  extend  these  ideas  to  more 
complex,  more  varied,  more  realistic  types  of  data,  and  how  to  automatically 
form  the  language  system  as  the  data  base  expands.  These  questions  are 
currently  under  study. 